Type key hash tweaks. #61234
CC: @davidwrighton
@jkotas - maybe you can take a look at this PR too? (also related to dictionaries in the loader)
Thanks!
@@ -166,7 +166,7 @@ class PendingTypeLoadEntry
     }
 #endif //DACCESS_COMPILE

-    TypeKey GetTypeKey()
+    TypeKey& GetTypeKey()
GCC wanted `&` here, since we pass the result by reference. Not sure how it worked with MSVC.
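To illustrate the remark above, here is a minimal sketch of the pattern. `ToyKey`, `ToyLoadEntry`, `ReadKey` and `Demo` are hypothetical names, not the runtime's types. In standard C++ a non-const lvalue reference cannot bind to a temporary, so a getter that returns by value cannot feed a `ToyKey&` parameter. MSVC has historically accepted that binding as a non-standard extension, which is one plausible explanation for why the original code compiled there.

```cpp
#include <cassert>

// Hypothetical stand-in for a key type.
struct ToyKey {
    int value;
};

// Hypothetical stand-in for an entry that owns a key.
class ToyLoadEntry {
public:
    explicit ToyLoadEntry(int v) : key_{v} {}

    // Returns a copy: the result is a temporary (prvalue).
    ToyKey GetKeyByValue() { return key_; }

    // Returns a reference to the stored key: the result is an lvalue.
    ToyKey& GetKeyByRef() { return key_; }

private:
    ToyKey key_;
};

// Takes its argument by non-const lvalue reference.
int ReadKey(ToyKey& key) { return key.value; }

int Demo() {
    ToyLoadEntry entry(42);
    // ReadKey(entry.GetKeyByValue());  // ill-formed in standard C++:
    //                                  // a temporary cannot bind to ToyKey&
    return ReadKey(entry.GetKeyByRef());  // fine: binds to the stored key
}
```

Returning a reference also avoids copying the key on every call, at the cost of tying the result's lifetime to the entry.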
TL;DR:
This change considerably reduces how often we need to compare types to resolve hash collisions in the `PendingTypeLoadTable` and `EETypeHashTable` dictionaries.

More details:
While working on a hackathon project I noticed that `HashKey::ComputeHash` is not the best. It mixes the number of generic arguments into the hash, which does not differ between instantiations of the same type and therefore does not improve the uniqueness of the hash, while it does not mix in the arguments themselves, meaning all instantiations of a given type get the same hash.

While looking at how to improve `HashKey::ComputeHash`, I realized that only `PendingTypeLoadTable` uses it and everywhere else we use `HashTypeKey()`, which is a better hash function. So I switched `PendingTypeLoadTable` to use `HashTypeKey` as well.

This reduced the number of collisions in `PendingTypeLoadTable`:
HelloWorld: 190 -> 154 (20% reduction in something that is not using generics much)
System.Linq.Expressions.Tests: ~5000 -> ~2900 (number varies between runs, but I see about 40% reduction)
I have also noticed that `HashTypeKey()` computes the hash of instantiated types by recursively hashing 2 levels of typedefs, which can still easily cause collisions if the instantiations differ one level lower. Considering that the type handles of type arguments are unique, simply hashing the type argument pointers would produce a much better hash.

It would be cheaper too. Hashing a linearly laid out sequence of pointers touches a lot less memory than a recursive walk through their parts.
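The difference between the two approaches can be sketched as follows. This is a toy model, not the runtime's code: `ToyTypeKey`, `Mix`, `HashByArgCount` and `HashByArgPointers` are hypothetical names, and the mixing step is an arbitrary FNV-style multiply chosen for illustration. It assumes, as the description states, that each distinct type argument has a unique address.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for a type key: a definition token plus the
// addresses of its type-argument handles (assumed unique per type).
struct ToyTypeKey {
    std::uint32_t typeDefToken;
    std::vector<const void*> typeArgs;
};

// Illustrative mixing step (FNV-style); not the hash used by the runtime.
inline std::uint32_t Mix(std::uint32_t hash, std::uint32_t value) {
    return (hash ^ value) * 16777619u;
}

// Weak variant: mixes only the argument count, so every instantiation
// of the same generic definition gets the same hash.
std::uint32_t HashByArgCount(const ToyTypeKey& key) {
    std::uint32_t h = Mix(2166136261u, key.typeDefToken);
    return Mix(h, static_cast<std::uint32_t>(key.typeArgs.size()));
}

// Stronger variant: mixes the pointer bits of each type argument in a
// single linear pass over the argument array.
std::uint32_t HashByArgPointers(const ToyTypeKey& key) {
    std::uint32_t h = Mix(2166136261u, key.typeDefToken);
    for (const void* arg : key.typeArgs) {
        const auto bits = static_cast<std::uint64_t>(
            reinterpret_cast<std::uintptr_t>(arg));
        h = Mix(h, static_cast<std::uint32_t>(bits));        // low 32 bits
        h = Mix(h, static_cast<std::uint32_t>(bits >> 32));  // high 32 bits
    }
    return h;
}
```

With two keys that share a definition token but point at different argument objects, `HashByArgCount` returns the same value for both, while `HashByArgPointers` separates them without touching anything beyond the pointer array itself.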
With this change I see the following number of collisions in `PendingTypeLoadTable`:
HelloWorld: 0 collisions (190 -> 0 overall reduction)
System.Linq.Expressions.Tests: ~300 collisions (5000 -> 300, ~95% reduction overall)
Out of curiosity I instrumented `EETypeHashTable::FindItem` to see how the change impacts the collision rate on the read path.

On System.Linq.Expressions.Tests I see:
before change: ~500000 - 700000 collisions
after change: 0 - 5 collisions
NOTE: `EETypeHashTable` has 2 levels of collision resolution.

There is an upper level (in `FindItem`) that deals with poor hashing, when different `TypeKey`s get the same hash code. These collisions are relatively expensive, since we resolve them by comparing the types' constituent parts: is it the same module, the same number of type args, the same definition, actually the same type args, etc. This is the kind of resolution that this change reduced by up to 100000x.

There is a lower level (`BaseFindFirstEntryByHash`, `BaseFindNextEntryByHash`) that deals with bucketization collisions, which must happen when an `int32` hash is mapped to a smaller number of buckets (the table uses a typical "mod prime" hash reducer). I see roughly the same number of hash comparisons before and after this change, as expected, since the change does not alter the bucketing strategy.
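The two levels can be sketched with a toy chained table. This is not the `EETypeHashTable` implementation: `ToyEntry`, `ToyHashTable` and its counters are hypothetical, a `std::string` stands in for a `TypeKey`, and the bucket count is assumed to be a non-zero prime supplied by the caller. The cheap level compares stored 32-bit hashes within a bucket; the expensive level compares keys, and only runs when the full hashes match.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical entry: the full 32-bit hash is stored next to the key.
struct ToyEntry {
    std::uint32_t hash;
    std::string key;  // stands in for a TypeKey
};

class ToyHashTable {
public:
    // bucketCount is assumed to be > 0 (ideally a prime).
    explicit ToyHashTable(std::size_t bucketCount) : buckets_(bucketCount) {}

    void Insert(std::uint32_t hash, const std::string& key) {
        // "mod prime" reducer: many hashes share one bucket by design.
        buckets_[hash % buckets_.size()].push_back(ToyEntry{hash, key});
    }

    const ToyEntry* Find(std::uint32_t hash, const std::string& key) {
        for (const ToyEntry& entry : buckets_[hash % buckets_.size()]) {
            ++hashComparisons;  // lower level: cheap integer compare
            if (entry.hash != hash) {
                continue;       // bucket collision only, key never inspected
            }
            ++keyComparisons;   // upper level: expensive key compare
            if (entry.key == key) {
                return &entry;
            }
        }
        return nullptr;
    }

    std::size_t hashComparisons = 0;
    std::size_t keyComparisons = 0;

private:
    std::vector<std::vector<ToyEntry>> buckets_;
};
```

In this model a better hash function lowers `keyComparisons` but leaves `hashComparisons` roughly unchanged, which matches the observation above that the bucketing behavior is unaffected.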